Brazilian patent BR112015029289B1 SYSTEMS AND METHODS FOR VERIFICATION OF PROCEDURE RETURN ADDRESS

Abstract:
Systems and methods for procedure return address verification. An exemplary processing system may comprise: a stack pointer configured to reference a first return address stored on a stack; a return address store pointer configured to reference a second return address stored in a return address store; and return address verification logic configured, in response to receiving a return instruction, to compare the first return address to the second return address.
Publication number: BR112015029289B1
Application number: R112015029289-5
Filing date: 2014-05-30
Publication date: 2022-01-11
Inventors: Gideon Gerzon; Jared W. Stark; Gal Diskin
Applicant: Intel Corporation
Main IPC class:

Description:

FIELD OF THE INVENTION
[0001] The present disclosure relates generally to computer systems, and specifically to systems and methods for procedure return address verification.
BACKGROUND
[0002] Malicious software can employ various return address corruption techniques to carry out a return-oriented programming (ROP) attack. ROP is a method of hijacking the execution flow of the current process by exploiting a return instruction which, in many processor architectures, retrieves from the top of the stack the address of the next instruction to be executed, normally the instruction that follows the corresponding call instruction within the calling routine. By modifying the return address on the stack, an attacker can therefore divert the execution flow of the current process to an arbitrary memory location. Once the execution flow has been hijacked, the attacker can, for example, initialize the arguments and perform a library function call. This technique is known as "return-to-library". In another example, the attacker can locate several instruction sequences within a code segment and chain them for execution. This approach is known as the "borrowed code chunks" technique.
[0003] An attacker can exploit various methods for the initial stack corruption, which is also called "stack pivoting". For example, the buffer overflow method involves supplying more input data than the routine expects to receive, under the assumption that the input buffer is located on the stack.
BRIEF DESCRIPTION OF THE DRAWINGS
[0004] The present disclosure is illustrated by way of example, not limitation, and may be better understood with reference to the following detailed description when considered in combination with the Figures, in which:
[0005] Figure 1 depicts a high-level component diagram of an exemplary computer system, in accordance with one or more aspects of the present disclosure;
[0006] Figure 2 depicts a block diagram of a processor in accordance with one or more aspects of the present disclosure;
[0007] Figures 3a to 3b schematically illustrate elements of a processor microarchitecture in accordance with one or more aspects of the present disclosure;
[0008] Figure 4 schematically illustrates various aspects of an exemplary processor and other components of the exemplary computer system 100 of Figure 1, in accordance with one or more aspects of the present disclosure;
[0009] Figure 5 schematically illustrates an example computer system stack memory layout in accordance with one or more aspects of the present disclosure;
[0010] Figure 6 schematically illustrates an example memory layout of the return address store, in accordance with one or more aspects of the present disclosure;
[0011] Figure 7 depicts a flow diagram of an exemplary method for detecting unauthorized stack pivoting, in accordance with one or more aspects of the present disclosure;
[0012] Figure 8 depicts a block diagram of an exemplary computer system, in accordance with one or more aspects of the present disclosure;
[0013] Figure 9 depicts a block diagram of an exemplary system on a chip (SoC) in accordance with one or more aspects of the present disclosure;
[0014] Figure 10 depicts a block diagram of an exemplary computer system in accordance with one or more aspects of the present disclosure; and
[0015] Figure 11 depicts a block diagram of an exemplary system on a chip (SoC) in accordance with one or more aspects of the present disclosure.
DETAILED DESCRIPTION
[0016] This document describes computer systems and methods for procedure return address verification. Unauthorized stack modification, or pivoting, can be used by a potential attacker in an attempt to carry out a return-oriented programming (ROP) attack. Such an attack may involve unauthorized modification of a procedure return address stored on the stack in order to divert the execution flow of the current process to an arbitrary memory location. An attacker can exploit various methods for unauthorized stack modification. For example, the buffer overflow method involves supplying more input data than the routine expects to receive, under the assumption that the input buffer is located on the stack.
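The overflow scenario described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the frame layout (an 8-byte input buffer immediately followed by a 64-bit little-endian return address) and the names `make_frame`, `unchecked_copy`, and `saved_return_address` are hypothetical and do not come from the disclosure.

```python
import struct

BUFFER_SIZE = 8  # assumed size of the on-stack input buffer


def make_frame(return_address):
    """Build a toy frame: input buffer followed by the saved return address."""
    return bytearray(BUFFER_SIZE) + bytearray(struct.pack("<Q", return_address))


def unchecked_copy(frame, data):
    """Copy input into the frame with no bounds check (the bug being modeled)."""
    frame[0:len(data)] = data


def saved_return_address(frame):
    """Read back the return address stored right after the buffer."""
    return struct.unpack_from("<Q", frame, BUFFER_SIZE)[0]


# Benign input fits in the buffer and leaves the return address intact.
frame = make_frame(0x401000)
unchecked_copy(frame, b"hi")
benign_address = saved_return_address(frame)

# Oversized input spills past the buffer and replaces the return address.
frame = make_frame(0x401000)
unchecked_copy(frame, b"A" * BUFFER_SIZE + struct.pack("<Q", 0xDEADBEEF))
corrupted_address = saved_return_address(frame)
```

In this model the second copy supplies more data than the buffer holds, so the bytes beyond the buffer land on the saved return address, which is the corruption a return instruction would then consume.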
[0017] In order to prevent unauthorized stack modification, a computer system may maintain a return address store designed to redundantly store procedure return addresses alongside the computer system stack. In response to receiving a call instruction, a processor of the computer system may place the return address both on the stack and in the return address store. In response to receiving a return instruction, the processor may retrieve the return addresses from the stack and from the return address store and compare them. If the two addresses match, the processor can continue executing the return instruction; otherwise, the processor may raise an exception, thus preventing a potential attacker from hijacking the execution flow of the current process. Various aspects of the methods and systems noted above are detailed below in this document by way of example, not limitation.
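The call/return check described above can be sketched in software as follows. This is a minimal illustration rather than the hardware design: it assumes the return address is pushed to both the stack and the return address store on every call, as the redundant storage implies, and the class and exception names are hypothetical.

```python
class ReturnAddressMismatch(Exception):
    """Raised when the stack and the return address store disagree."""


class ToyProcessor:
    def __init__(self):
        self.stack = []                 # ordinary stack, writable by software
        self.return_address_store = []  # redundant copy of return addresses

    def call(self, return_address):
        # On a call, record the return address in both places.
        self.stack.append(return_address)
        self.return_address_store.append(return_address)

    def ret(self):
        # On a return, fetch both copies and compare before continuing.
        from_stack = self.stack.pop()
        from_store = self.return_address_store.pop()
        if from_stack != from_store:
            raise ReturnAddressMismatch(
                f"stack has {from_stack:#x}, store has {from_store:#x}"
            )
        return from_stack


cpu = ToyProcessor()
cpu.call(0x401000)
normal_target = cpu.ret()          # both copies agree, return proceeds

cpu.call(0x401000)
cpu.stack[-1] = 0xDEADBEEF         # simulated corruption of the stack copy
try:
    cpu.ret()
    attack_detected = False
except ReturnAddressMismatch:
    attack_detected = True
```

The sketch models only the comparison step; where the return address store lives and how it is protected from software writes are outside what this example covers.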
[0018] In the following description, numerous specific details are presented, such as examples of specific types of processors and system configurations, specific hardware structures, specific architectural and microarchitectural details, specific register configurations, specific instruction types, specific system components, specific measurements/heights, and specific processor pipeline stages and operation, in order to provide a thorough understanding of the present disclosure. However, it will be apparent to a person skilled in the art that these specific details need not be employed to practice the methods disclosed herein. In other instances, well-known components or methods, such as specific or alternative processor architectures, specific logic circuits/code for described algorithms, specific firmware code, specific interconnect operation, specific logic configurations, specific manufacturing techniques and materials, specific compiler implementations, specific expression of algorithms in code, specific disablement or blocking techniques/logic, and other specific operational details of computer systems have not been described in detail in order to avoid unnecessarily complicating the present disclosure.
[0019] While the following embodiments are described with reference to a processor, other embodiments are applicable to other types of integrated circuits and logic devices. Similar techniques and teachings of the embodiments described herein can be applied to other types of circuits or semiconductor devices that can benefit from higher pipeline throughput and improved performance. The teachings of the embodiments described herein are applicable to any processor or machine that performs data manipulations. However, the present disclosure is not limited to processors or machines that perform 512-bit, 256-bit, 128-bit, 64-bit, 32-bit or 16-bit data operations and can be applied to any processor and machine in which manipulation or management of data is performed. In addition, the following description provides examples, and the accompanying drawings show various examples for the purposes of illustration. However, these examples should not be construed in a limiting sense, as they are intended only to provide examples of the embodiments described herein rather than an exhaustive list of all possible implementations.
[0020] While the examples below describe instruction handling and distribution in the context of execution units and logic circuits, other embodiments of the systems and methods described in this document may be realized by way of data or instructions stored on a machine-readable, tangible medium which, when performed by a machine, cause the machine to perform functions consistent with at least one embodiment described herein. In one embodiment, functions associated with the embodiments described herein are embodied in machine-executable instructions. The instructions can be used to cause a general-purpose or special-purpose processor that is programmed with the instructions to perform the steps of the methods described in this document. The embodiments described herein may be provided as a computer program product or software which may include a machine-readable or computer-readable medium having stored thereon instructions which can be used to program a computer (or other electronic devices) to perform one or more operations in accordance with the embodiments described herein. Alternatively, the operations of the embodiments described herein may be performed by specific hardware components that contain fixed-function logic for performing the operations, or by any combination of programmed computer components and fixed-function hardware components.
[0021] The instructions used to program logic to perform the methods described in this document may be stored within a memory in the system, such as DRAM, cache, flash memory, or other storage. In addition, the instructions may be distributed over a network or by way of other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy disks, optical disks, Compact Disc Read-Only Memories (CD-ROMs), magneto-optical disks, Read-Only Memory (ROM), Random Access Memory (RAM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), magnetic or optical cards, flash memory, or a tangible, machine-readable storage used in the transmission of information over the Internet via electrical, optical, acoustic or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Accordingly, the computer-readable medium includes any type of tangible machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
[0022] "Processor" in this document shall refer to a device capable of executing instructions that encode arithmetic, logical, or I/O operations. In an illustrative example, a processor may follow the Von Neumann architectural model and may include an arithmetic logic unit (ALU), a control unit, and a plurality of registers. In a further aspect, a processor may include one or more processor cores; hence, it may be a single-core processor, which is typically capable of processing a single instruction pipeline, or a multi-core processor, which can process multiple instruction pipelines simultaneously. In another aspect, a processor may be implemented as a single integrated circuit, as two or more integrated circuits, or as a component of a multi-chip module (e.g., in which individual microprocessor dies are included in a single integrated circuit package and hence share a single socket).
[0023] Figure 1 depicts a block diagram of an exemplary computer system, in accordance with one or more aspects of the present disclosure. A computer system 100 may include a processor 102 that employs execution units including logic to perform algorithms for processing data, in accordance with the embodiments described herein. System 100 is representative of processing systems based on the PENTIUM III™, PENTIUM 4™, Xeon™ and/or Itanium processors available from Intel Corporation of Santa Clara, California, although other systems (including PCs that have other microprocessors, engineering workstations, set-top boxes) can also be used. In one embodiment, sample system 100 runs a version of the WINDOWS™ operating system available from Microsoft Corporation of Redmond, Washington, although other operating systems (UNIX and Linux, for example), embedded software, and/or graphical user interfaces can also be used. Thus, the embodiments described herein are not limited to any specific combination of hardware circuitry and software.
[0024] The embodiments are not limited to computer systems. Alternative embodiments of the systems and methods described herein may be used in other devices, such as handheld devices and embedded applications. Some examples of handheld devices include cell phones, Internet Protocol devices, digital cameras, personal digital assistants (PDAs), and handheld PCs. Embedded applications may include a microcontroller, a digital signal processor (DSP), a system on a chip, network computers (NetPC), set-top boxes, network hubs, wide area network (WAN) switches, or any other system that can carry out one or more instructions in accordance with at least one embodiment.
[0025] In this illustrated embodiment, the processor 102 includes one or more execution units 108 to implement an algorithm that is to perform at least one instruction. One embodiment may be described in the context of a single-processor desktop or server system, but alternative embodiments may be included in a multiprocessor system. System 100 is an example of a "hub" system architecture. The computer system 100 includes a processor 102 for processing data signals. Processor 102, as an illustrative example, includes a complex instruction set computer (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor that implements a combination of instruction sets, or any other processor device, such as a digital signal processor, for example. Processor 102 is coupled to a processor bus 110 that transmits data signals between processor 102 and other components in system 100. The elements of system 100 (e.g., graphics accelerator 112, memory controller hub 116, memory 120, I/O controller hub 130, wireless transceiver 126, Flash BIOS 128, network controller 134, audio controller 136, serial expansion port 138, I/O controller 140, etc.) perform their conventional functions, which are well known to persons familiar with the art.
[0026] In one embodiment, the processor 102 includes an internal level 1 (L1) cache 104. Depending on the architecture, the processor 102 may have a single internal cache or multiple levels of internal caches. Other embodiments include a combination of both internal and external caches, depending on the particular implementation and needs. Register file 106 is to store different types of data in various registers, including integer registers, floating-point registers, vector registers, banked registers, shadow registers, checkpoint registers, status registers and an instruction pointer register.
[0027] Execution unit 108, which includes logic for performing integer and floating-point operations, also resides in processor 102. Processor 102, in one embodiment, includes a microcode (ucode) ROM to store microcode which, when executed, is to perform algorithms for certain macroinstructions or handle complex scenarios. Here, the microcode is potentially updateable to handle logic bugs/fixes for the processor 102. For one embodiment, the execution unit 108 includes logic for handling a packed instruction set 109. By including the packed instruction set 109 in the instruction set of a general-purpose processor 102, together with associated circuitry to execute the instructions, the operations used by many multimedia applications can be performed using packed data on a general-purpose processor 102. In this way, many multimedia applications are accelerated and run more efficiently by using a processor's data bus to perform operations on packed data. This potentially eliminates the need to transfer smaller units of data across the processor's data bus to perform one or more operations, one data element at a time. Alternative embodiments of an execution unit 108 may also be used in microcontrollers, embedded processors, graphics devices, DSPs and other types of logic circuitry.
[0028] In certain implementations, processor 102 may additionally include a stack limit low register 421, a stack limit high register 423, and return address verification logic 150. In an illustrative example, processor 102 can include a pair of stack limit registers for each of two or more modes of operation, for example, 32-bit user mode, 64-bit user mode, and supervisor mode. The operation of the return address verification logic 150 is described in detail below in this document.
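One way to picture a per-mode pair of stack limit registers is the lookup below. It is a sketch only: the passage above names the registers and the modes but does not say how the limits are compared, so the inclusive bounds check, the mode keys, the address values, and the function name `stack_pointer_in_bounds` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackLimits:
    low: int    # models the stack limit low register
    high: int   # models the stack limit high register


# Hypothetical limit values, one pair per mode of operation.
LIMITS_BY_MODE = {
    "user32": StackLimits(low=0x0000_1000, high=0x0000_2000),
    "user64": StackLimits(low=0x7FFF_0000, high=0x7FFF_8000),
    "supervisor": StackLimits(low=0xFFFF_0000, high=0xFFFF_4000),
}


def stack_pointer_in_bounds(mode, stack_pointer):
    """Return True if the stack pointer lies within the limits for this mode."""
    limits = LIMITS_BY_MODE[mode]
    return limits.low <= stack_pointer <= limits.high


inside = stack_pointer_in_bounds("user32", 0x1800)
outside = stack_pointer_in_bounds("user32", 0x3000)
```

Keeping one pair per mode means a mode switch selects a different pair rather than rewriting a single shared pair.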
[0029] The system 100 includes a memory 120. The memory 120 includes a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, a flash memory device, or other memory device. Memory 120 stores instructions 121 and/or data 123, represented by data signals, that are to be executed by processor 102. In certain implementations, instructions 121 may include instructions that employ return address verification logic 150 to detect an attempted stack limit violation, as described in more detail below in this document.
[0030] A system logic chip 116 is coupled to processor bus 110 and memory 120. System logic chip 116 in the illustrated embodiment is a memory controller hub (MCH). Processor 102 may communicate with MCH 116 via the processor bus 110. MCH 116 provides a high-bandwidth memory path 118 to memory 120 for storing instructions and data, and for storing graphics commands, data and textures. The MCH 116 is to route data signals between the processor 102, the memory 120 and other components in the system 100, and to bridge the data signals between the processor bus 110, the memory 120 and the system I/O 122. In some embodiments, system logic chip 116 may provide a graphics port for coupling to a graphics controller 112. MCH 116 is coupled to memory 120 via memory interface 118. Graphics card 112 is coupled to MCH 116 via an Accelerated Graphics Port (AGP) interconnect 114.
[0031] System 100 uses a proprietary hub interface bus 122 to couple the MCH 116 to the I/O controller hub (ICH) 130. The ICH 130 provides direct connections to some I/O devices via a local I/O bus. The local I/O bus is a high-speed I/O bus for connecting peripherals to memory 120, the chipset and processor 102. Some examples are the audio controller, firmware hub (flash BIOS) 128, wireless transceiver 126, data storage 124, a legacy I/O controller that contains user input and keyboard interfaces, a serial expansion port such as Universal Serial Bus (USB), and a network controller 134. The data storage device 124 may comprise a hard disk drive, a floppy disk drive, a CD-ROM device, a flash memory device or other mass storage device. Data storage device 124 may store executable instructions for execution by processor 102. In certain implementations, instructions 121 may include instructions that employ return address verification logic 150 to detect an attempted stack limit violation, as described in more detail below in this document.
[0032] For another embodiment of a system, an instruction in accordance with one embodiment can be used with a system on a chip. One embodiment of a system on a chip comprises a processor and a memory. The memory for one such system is flash memory. The flash memory can be located on the same die as the processor and other system components. Additionally, other logic blocks, such as a memory controller or a graphics controller, can also be located on a system on a chip.
[0033] Figure 2 is a block diagram of the microarchitecture for a processor 200 that includes logic circuits for carrying out instructions in accordance with one or more aspects of the present disclosure. In some embodiments, an instruction in accordance with one embodiment may be implemented to operate on data elements that have sizes of byte, word, doubleword, quadword, etc., as well as data types such as single- and double-precision integer and floating-point data types. In one embodiment, the in-order front end 201 is the part of the processor 200 that fetches instructions to be executed and prepares them for later use in the processor pipeline. The front end 201 can include several units. In one embodiment, the prefetcher 226 fetches instructions from memory and feeds them to an instruction decoder 228 which, in turn, decodes or interprets them. For example, in one embodiment, the decoder decodes a received instruction into one or more operations called "microinstructions" or "micro-operations" (also called uops) that the machine can perform. In other embodiments, the decoder parses the instruction into an opcode and corresponding data and control fields that are used by the microarchitecture to perform operations in accordance with one embodiment. In one embodiment, trace cache 230 takes decoded uops and assembles them into program-ordered sequences or traces in uop queue 234 for execution. When trace cache 230 encounters a complex instruction, microcode ROM 232 provides the uops needed to complete the operation.
[0034] Some instructions are converted into a single micro-op, while others need multiple micro-ops to complete the entire operation. In one embodiment, if more than four micro-ops are required to complete an instruction, decoder 228 accesses microcode ROM 232 to carry out the instruction. For one embodiment, an instruction may be decoded into a small number of micro-ops for processing at the instruction decoder 228. In another embodiment, an instruction may be stored within microcode ROM 232 if a number of micro-ops are required to accomplish the operation. Trace cache 230 refers to an entry point programmable logic array (PLA) to determine a correct microinstruction pointer for reading the microcode sequences from microcode ROM 232 to complete one or more instructions in accordance with one embodiment. After the microcode ROM 232 finishes sequencing the micro-ops for an instruction, the front end 201 of the machine resumes fetching micro-ops from the trace cache 230.
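The four-micro-op threshold mentioned above can be expressed as a small routing rule. This is a sketch of that single rule for one embodiment, not a model of the decoder; the constant and function names are hypothetical.

```python
MAX_DECODER_UOPS = 4  # threshold stated for one embodiment


def uop_source(uop_count):
    """Pick where the micro-ops for an instruction come from.

    Instructions needing more than MAX_DECODER_UOPS micro-ops are
    sequenced from the microcode ROM; the rest come from the decoder.
    """
    if uop_count < 1:
        raise ValueError("an instruction needs at least one micro-op")
    return "microcode_rom" if uop_count > MAX_DECODER_UOPS else "decoder"


simple_path = uop_source(1)
boundary_path = uop_source(4)
complex_path = uop_source(5)
```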
[0035] The out-of-order execution engine 203 is where instructions are prepared for execution. The out-of-order execution logic has several buffers to smooth out and reorder the flow of instructions to optimize performance as they go down the pipeline and are scheduled for execution. The allocator logic allocates the machine buffers and resources that each uop needs in order to execute. The register renaming logic maps logical registers onto entries in a register file. The allocator also allocates an entry for each uop in one of two uop queues, one for memory operations and one for non-memory operations, ahead of the instruction schedulers: the memory scheduler, fast scheduler 202, slow/general floating-point scheduler 204, and simple floating-point scheduler 206. The uop schedulers 202, 204, 206 determine when a uop is ready to execute based on the readiness of its dependent input register operand sources and the availability of the execution resources the uop needs to complete its operation. The fast scheduler 202 of one embodiment may schedule on each half of the main clock cycle, whereas the other schedulers may schedule only once per main processor clock cycle. The schedulers arbitrate for the dispatch ports to schedule uops for execution.
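The readiness rule above (all source operands ready, and the required execution resource available) can be sketched as a selection pass over a uop queue. This is a toy model under stated assumptions: the `Uop` fields, the port names, and the one-uop-per-port-per-cycle policy are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    name: str
    sources: tuple  # names of the source registers this uop reads
    port: str       # hypothetical dispatch port the uop needs


def ready_uops(queue, ready_registers, free_ports):
    """Select uops whose sources are all ready and whose port is still free."""
    picked = []
    available = set(free_ports)
    for uop in queue:
        operands_ready = all(src in ready_registers for src in uop.sources)
        if operands_ready and uop.port in available:
            picked.append(uop)
            available.discard(uop.port)  # assume one uop per port per cycle
    return picked


queue = [
    Uop("add", ("r1", "r2"), "alu0"),
    Uop("mul", ("r3",), "alu0"),    # operands ready, but alu0 is taken by add
    Uop("load", ("r4",), "agu0"),   # port free, but r4 is not ready yet
]
picked_names = [u.name for u in ready_uops(queue, {"r1", "r2", "r3"}, {"alu0", "agu0"})]
```

Only `add` is selected in this cycle: `mul` loses arbitration for the shared port and `load` waits on an operand.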
[0036] Physical register files 208, 210 are located between the schedulers 202, 204, 206 and the execution units 212, 214, 216, 218, 220, 222, 224 in execution block 211. There is a separate register file 208, 210 for integer and floating-point operations, respectively. Each register file 208, 210 of one embodiment also includes a bypass network that can bypass or forward just-completed results that have not yet been written into the register file to new dependent uops. The integer register file 208 and the floating-point register file 210 are also capable of communicating data with each other. For one embodiment, the integer register file 208 is split into two separate register files, one register file for the low-order 32 bits of data and a second register file for the high-order 32 bits of data. The floating-point register file 210 of one embodiment has 128-bit wide entries, because floating-point instructions typically have operands from 64 to 128 bits in width.
[0037] Execution block 211 contains the execution units 212, 214, 216, 218, 220, 222, 224, where the instructions are actually executed. This section includes the register files 208, 210, which store the integer and floating-point data operand values that the microinstructions need to execute. The processor 200 of one embodiment comprises several execution units: address generation unit (AGU) 212, AGU 214, fast ALU 216, fast ALU 218, slow ALU 220, floating-point ALU 222, and floating-point move unit 224. For one embodiment, the floating-point execution blocks 222, 224 execute floating-point, MMX, SIMD, SSE, or other operations. The floating-point ALU 222 of one embodiment includes a 64-bit by 64-bit floating-point divider to execute divide, square root, and remainder micro-ops. For the systems and methods described in this document, instructions involving a floating-point value can be handled with the floating-point hardware. In one embodiment, ALU operations go to the high-speed ALU execution units 216, 218. The fast ALUs 216, 218 of one embodiment can execute fast operations with an effective latency of half a clock cycle. For one embodiment, most complex integer operations go to the slow ALU 220, since the slow ALU 220 includes integer execution hardware for long-latency types of operations, such as multiplies, shifts, flag logic and branch processing. Memory load/store operations are executed by the AGUs 212, 214. For one embodiment, the integer ALUs 216, 218, 220 are described in the context of performing integer operations on 64-bit data operands. In alternative embodiments, the ALUs 216, 218, 220 may be implemented to support a variety of data widths, including 16, 32, 128, 256 bits, etc. Similarly, the floating-point units 222, 224 can be implemented to support a range of operands of various bit widths. For one embodiment, the floating-point units 222, 224 may operate on 128-bit wide packed data operands in conjunction with SIMD and multimedia instructions.
[0038] In one embodiment, the uop schedulers 202, 204, 206 dispatch dependent operations before the parent load has finished executing. As uops are speculatively scheduled and executed in processor 200, the processor 200 also includes logic to handle memory misses. If a data load misses in the data cache, there may be dependent operations in flight in the pipeline that have left the scheduler with temporarily incorrect data. A replay mechanism tracks and re-executes instructions that use incorrect data. The dependent operations need to be replayed, and the independent ones are allowed to complete. The schedulers and replay mechanism of one embodiment of a processor are also designed to catch instruction sequences for text string comparison operations.
[0039] The term "registers" may refer to the on-board processor storage locations that are used as part of instructions to identify operands. In other words, registers may be those that are usable from outside the processor (from a programmer's perspective). However, the registers of an embodiment should not be limited in meaning to a particular type of circuit. Rather, a register of an embodiment is capable of storing and providing data and of performing the functions described herein. The registers described in this document can be implemented by circuitry within a processor using any number of different techniques, such as dedicated physical registers, dynamically allocated physical registers using register renaming, combinations of dedicated and dynamically allocated physical registers, etc. In one embodiment, the integer registers store thirty-two-bit integer data. A register file of one embodiment also contains eight multimedia SIMD registers for packed data. For the discussions below, the registers are understood to be data registers designed to hold packed data, such as the 64-bit wide MMX registers (also called "mm" registers in some instances) in microprocessors enabled with MMX™ technology from Intel Corporation of Santa Clara, California. These MMX registers, available in both integer and floating-point forms, can operate with packed data elements that accompany SIMD and SSE instructions. Similarly, the 128-bit wide XMM registers relating to SSE2, SSE3, SSE4, or later technology (generally referred to as "SSEx") can also be used to hold such packed data operands. In one embodiment, in storing packed data and integer data, the registers do not need to differentiate between the two data types. In one embodiment, integer and floating-point data are contained either in the same register file or in different register files. Additionally, in one embodiment, floating-point and integer data can be stored in different registers or in the same registers.
[0040] Figures 3a to 3b schematically illustrate elements of a processor microarchitecture in accordance with one or more aspects of the present disclosure. In Figure 3a, a processor pipeline 400 includes a fetch stage 402, a length decode stage 404, a decode stage 406, an allocation stage 408, a renaming stage 410, a scheduling stage (also known as a dispatch or issue stage) 412, a register read/memory read stage 414, an execution stage 416, a write back/memory write stage 418, an exception handling stage 422, and a commit stage 424.
[0041] In Figure 3b, the arrows demonstrate a coupling between two or more units and the direction of the arrow indicates a direction of data flow between these units. Figure 3b shows a processor core 490 that includes a front end unit 430 coupled to an execution engine unit 450, both of which are coupled to a memory unit 470.
[0042] Core 490 can be a reduced instruction set computing (RISC) core, a complex instruction set computing (CISC) core, a very long instruction word (VLIW) core, or a type of core. hybrid or alterative. As yet another option, the core 490 may be a special purpose core, such as, for example, a network or communication core, compression engine, graphics core, or the like.
[0043] The front end unit 430 includes a branch prediction unit 432 coupled to an instruction cache unit 434, which is coupled to a temporary storage to the instruction translation portion (TLB) 436, which is coupled to to an instruction fetch unit 438, which is coupled to a decoding unit 440. The decoding unit or decoder may decode instructions and generate, as an output, one or more micro-operations, microcode insertion points, micro-instructions, other instructions, or other control signals, that are decoded or are derived from the original instructions or that otherwise reflect the same. The decoder can be deployed using several different mechanisms. Examples of suitable mechanisms include, but are not limited to, lookup tables, hardware implementations, programmable logic arrays (PLAs), microcode read-only memories (ROMs), etc. Instruction cache unit 434 is additionally coupled to a level 2 (L2) cache unit 476 in memory unit 470. Decoding unit 440 is coupled to an allocator/rename unit 452 in execution engine unit 450 .
[0044] The execution engine unit 450 includes the allocator/rename unit 452 coupled to a retirement unit 454 and a set of one or more scheduler unit(s) 456. The scheduler unit(s) 456 represent(s) any number of different schedulers, including reservation stations, a central instruction window, etc. The scheduler unit(s) 456 is/are coupled to the physical register file unit(s) 458. Each of the physical register file unit(s) 458 represents one or more physical register files, different ones of which store one or more different data types, such as scalar integer, scalar floating point, packed integer, packed floating point, vector integer, vector floating point, etc., and status (e.g., an instruction pointer that is the address of the next instruction to be executed). The physical register file unit(s) 458 is/are overlapped by the retirement unit 454 to illustrate various ways in which register renaming and out-of-order execution may be implemented (e.g., using reorder buffer(s) and retirement register file(s); using future file(s), history buffer(s), and retirement register file(s); using register maps and a pool of registers; etc.). In general, architectural registers are visible from outside the processor or from a programmer's perspective. The registers are not limited to any particular circuit type. Various types of registers are suitable as long as they are capable of storing and providing data as described in this document. Examples of suitable registers include, but are not limited to, dedicated physical registers, dynamically allocated physical registers that use register renaming, combinations of dedicated and dynamically allocated physical registers, etc. The retirement unit 454 and the physical register file unit(s) 458 are coupled to the execution cluster(s) 460. The execution cluster(s) 460 include(s) a set of one or more execution units 462 and a set of one or more memory access units 464. Execution units 462 can perform various operations (e.g., shifts, addition, subtraction, multiplication) on various types of data (e.g., scalar floating point, packed integer, packed floating point, vector integer, vector floating point). While some embodiments may include multiple execution units dedicated to specific functions or sets of functions, other embodiments may include one execution unit or multiple execution units that all perform all functions. The scheduler unit(s) 456, the physical register file unit(s) 458, and the execution cluster(s) 460 are shown as possibly plural because certain embodiments create separate pipelines for certain types of data/operations (e.g., a scalar integer pipeline, a scalar floating point/packed integer/packed floating point/vector integer/vector floating point pipeline, and/or a memory access pipeline, each of which has its own scheduler unit, physical register file(s) unit, and/or execution cluster; in the case of a separate memory access pipeline, certain embodiments are implemented in which only the execution cluster of that pipeline has the memory access unit(s) 464). It should also be understood that when separate pipelines are used, one or more of these pipelines may be out-of-order issue/execution and the rest in-order.
[0045] The set of memory access units 464 is coupled to the memory unit 470, which includes a data TLB unit 472 coupled to a data cache unit 474 coupled to a level 2 (L2) cache unit 476. In an exemplary embodiment, memory access units 464 may include a load unit, a store address unit, and a store data unit, each of which is coupled to the data TLB unit 472 in the memory unit 470. L2 cache unit 476 is coupled to one or more other cache levels and ultimately to main memory.
[0046] By way of example, the exemplary register renaming, out-of-order issue/execution core architecture can implement the pipeline 400 as follows: instruction fetch unit 438 performs the fetch and length decoding stages 402 and 404; decoding unit 440 performs decoding stage 406; the allocator/rename unit 452 performs the allocation stage 408 and the renaming stage 410; the scheduler unit(s) 456 perform(s) the scheduling stage 412; the physical register file unit(s) 458 and the memory unit 470 perform the register read/memory read stage 414; execution cluster 460 performs execution stage 416; the memory unit 470 and the physical register file unit(s) 458 perform the write-back/memory write stage 418; multiple units may be involved in the exception handling stage 422; and the retirement unit 454 and the physical register file unit(s) 458 perform the commit stage 424.
[0047] Core 490 may support one or more instruction sets (e.g., the x86 instruction set (with some extensions that have been added with newer versions); the MIPS instruction set from MIPS Technologies of Sunnyvale, CA; the ARM instruction set (with additional extensions such as NEON) from ARM Holdings of Sunnyvale, CA).
[0048] In certain implementations, the core may support multithreading (executing two or more parallel sets of operations or threads) and can do so in a variety of ways, including time-sliced multithreading, simultaneous multithreading (where a single physical core provides a logical core for each of the threads that the physical core is simultaneously multithreading), or a combination thereof (e.g., time-sliced fetching and decoding with simultaneous multithreading thereafter, such as in Intel® Hyper-Threading technology).
[0049] Although the illustrated embodiment of the processor includes separate instruction and data cache units 434/474 and a shared L2 cache unit 476, alternative embodiments may have a single internal cache for both instructions and data, for example, a level 1 (L1) internal cache, or multiple levels of internal cache. In some embodiments, the system may include a combination of an internal cache and an external cache that is external to the core and/or processor. Alternatively, all of the cache may be external to the core and/or processor.
[0050] Figure 4 illustrates a block diagram of an exemplary processor 102 of computer system 100, in accordance with one or more aspects of the present disclosure. Referring to Figure 4, processor core 490 may include a fetch unit 202 to fetch instructions for execution by core 490. Instructions may be fetched from one or more storage devices, such as memory 115. Processor core 490 may additionally include a decoding unit 440 for decoding a fetched instruction into one or more micro-operations (µops). Processor core 490 may additionally include a scheduling unit 446 for storing a decoded instruction received from decoding unit 440 until the instruction is ready to be issued, for example, until operand values for the decoded instruction become available. Scheduling unit 446 can schedule and/or issue decoded instructions to an execution unit 450.
[0051] Execution unit 450 may include one or more arithmetic and logic units (ALUs), one or more integer execution units, one or more floating point execution units, and/or other execution units. In certain implementations, execution unit 450 may execute instructions out of order (OOO). Processor core 490 may additionally include a retirement unit 454 to retire executed instructions after they are committed.
[0052] In certain implementations, processor 102 may additionally comprise return address verification logic 150 designed to verify procedure return addresses in order to prevent unauthorized stack pivoting. Return address verification logic 150 may include a return address buffer pointer 421 configured to reference an element of a return address store 423. Although in Figure 4 return address store 423, return address buffer pointer 421, and logic 150 are shown as located within a core 490, at least some of the elements referenced above may be provided at some other location in the computer system 100. For example, return address store 423 may reside, at least partially, within a memory that is external to the processor 102. Additionally, return address store 423, return address buffer pointer 421, logic 150, and/or some of their respective components may be shared among a plurality of processor cores.
[0053] Numerous programming languages support a notion of procedure, which is a unit of code that has an entry point and at least one return instruction. A procedure can be invoked by a call instruction executed within another procedure. A return instruction can cause the processor to transfer the flow of execution back to the calling procedure (for example, to the instruction that follows the corresponding call instruction within the calling procedure). In certain processor architectures, the return address and/or parameters that are passed to a procedure may be stored on a stack, which is a data structure within a computer's memory system. A stack can be represented by a linear array that supports the "last in, first out" (LIFO) access paradigm, as schematically illustrated by Figure 5.
[0054] In the illustrative example of Figure 5, the stack grows towards lower memory addresses. Data items can be pushed onto the stack using the PUSH instruction and retrieved from the stack using the POP instruction. In order to push a data item onto the stack, the processor may modify (for example, decrement) the value of a stack pointer, then copy the data item into the memory location referenced by the stack pointer. Therefore, the stack pointer always references the topmost element of the stack. In order to retrieve a data item from the stack, processor 102 may read the data item referenced by the stack pointer, then modify (e.g., increment) the value of the stack pointer so that it references the element that was placed on the stack immediately before the element being retrieved. On certain processor architectures, the stack pointer may be stored in a dedicated processor register, called SP or ESP.
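By way of a non-limiting illustration, the push and pop behavior described above can be sketched in Python. The sketch is a toy model under stated assumptions: the class name `DownwardStack`, the word-addressed `memory` list, and the choice of an empty stack whose pointer sits one past the end of memory are all hypothetical and are not taken from the disclosure.

```python
class DownwardStack:
    """Toy model of a stack that grows toward lower addresses."""

    def __init__(self, size):
        self.memory = [0] * size  # hypothetical word-addressed memory
        self.sp = size            # stack pointer; an empty stack points past the end

    def push(self, value):
        if self.sp == 0:
            raise OverflowError("stack overflow")
        self.sp -= 1                  # decrement the stack pointer first
        self.memory[self.sp] = value  # then store the item at the new top

    def pop(self):
        if self.sp == len(self.memory):
            raise IndexError("stack underflow")
        value = self.memory[self.sp]  # read the topmost element
        self.sp += 1                  # then point at the previously pushed element
        return value


stack = DownwardStack(8)
stack.push(0x1000)
stack.push(0x2000)
top = stack.pop()  # the most recently pushed item comes back first
```

After the two pushes and one pop, the stack pointer again references the first item pushed, which reflects the LIFO paradigm of Figure 5.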
[0055] Processor 102 may employ several segment registers to support a memory segmentation mechanism. In certain implementations, processor 102 may additionally support memory segment typing in order to restrict the memory access operations that can be performed on a particular type of segment. Segment typing can be supported by associating memory types with segment registers. In one example, processor 102 may include at least one code segment register (which may also be called CS), two or more data segment registers (which may also be called DS, ES, FS and GS), and at least one stack segment register (which may also be called SS).
[0056] During the execution of a call instruction, processor 102 may, before branching to the first instruction of the called procedure, push the address stored in the instruction pointer register (EIP) onto the current stack. This address, also called the return instruction pointer, points to the instruction where execution of the calling procedure is to continue following a return from the called procedure. During execution of a return instruction within the called procedure, processor 102 may retrieve the return instruction pointer from the stack back into the EIP register and thereby continue execution of the calling procedure.
[0057] It should be noted that in certain implementations, the processor 102 does not require the return instruction pointer to point back to the calling procedure. Before the return instruction is executed, the return instruction pointer stored on the stack can be manipulated by software (for example, by executing a PUSH instruction) to point to an arbitrary address.
[0058] In order to prevent a potential attacker from taking advantage of this behavior to divert the flow of execution to an arbitrary memory location, processor 102 may, in response to receipt of a call instruction, store the return instruction pointer not only on the stack, but also in the return address store 423. In response to receiving a return instruction within the called procedure, the processor 102 may retrieve and compare the return instruction pointers from the stack and from the return address store. If the two addresses match, the processor can continue executing the return instruction; otherwise, the processor may generate an exception.
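By way of a non-limiting illustration, the call and return handling described above can be sketched as a software model. This is only a sketch under stated assumptions: the disclosure describes processor logic, whereas the names `CheckedCallStack`, `ReturnAddressMismatch`, `call`, and `ret`, as well as the example addresses, are hypothetical and chosen purely for illustration.

```python
class ReturnAddressMismatch(Exception):
    """Stands in for the exception the processor may generate on a mismatch."""


class CheckedCallStack:
    def __init__(self):
        self.stack = []                 # software-visible stack
        self.return_address_store = []  # models return address store 423

    def call(self, return_address):
        # A call records the return instruction pointer in both places.
        self.stack.append(return_address)
        self.return_address_store.append(return_address)

    def ret(self):
        # A return retrieves both copies and compares them before continuing.
        from_stack = self.stack.pop()
        from_store = self.return_address_store.pop()
        if from_stack != from_store:
            raise ReturnAddressMismatch(hex(from_stack))
        return from_stack


core = CheckedCallStack()
core.call(0x401005)
normal_target = core.ret()  # both copies match, so the return proceeds

core.call(0x401005)
core.stack[-1] = 0xDEADBEEF  # simulated overwrite of the on-stack return address
try:
    core.ret()
    attack_detected = False
except ReturnAddressMismatch:
    attack_detected = True
```

In this model the overwritten address is never returned to the caller, because the comparison fails before control would be transferred.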
[0059] In certain implementations, the return address store 423 may comprise a first buffer 423-1 stored within the internal memory 602 of the processor 102 and a second buffer 423-2 stored within the external memory 604, which may be communicatively coupled to the processor 102 via a system bus 608, as schematically illustrated in Figure 6. The second buffer 423-2 residing in external memory 604 may be used as an "overflow" buffer if the size of the first buffer 423-1 becomes insufficient to store the return addresses.
[0060] In order to implement the "overflow" functionality, in an illustrative example, the return address buffer pointer 421 can be initialized to point to the base of the internal buffer 423-1 and can be modified (for example, decremented) when a new return address is placed in return address store 423. When the limit of the internal buffer 423-1 is reached, the next pointer modification operation may cause the return address buffer pointer 421 to point to the base of the external buffer 423-2. Similar functionality can be implemented to switch the return address buffer pointer 421 from the external buffer 423-2 back to the internal buffer 423-1 in response to removing a return address from return address store 423.
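By way of a non-limiting illustration, the switch between the internal and external buffers can be sketched as follows. The sketch makes simplifying assumptions that are not part of the disclosure: each buffer is modeled as a Python list rather than as a memory region addressed by a decrementing pointer, and the names `SpillingReturnAddressStore`, `internal_capacity`, and `active_buffer` are hypothetical.

```python
class SpillingReturnAddressStore:
    """Toy model of a return address store with an internal and an overflow buffer."""

    def __init__(self, internal_capacity):
        self.internal_capacity = internal_capacity
        self.internal = []  # models internal buffer 423-1
        self.external = []  # models external "overflow" buffer 423-2

    def active_buffer(self):
        """Name of the buffer that holds the most recent entry."""
        return "external" if self.external else "internal"

    def push(self, return_address):
        # Once the internal buffer is full, new entries spill to the external one.
        if len(self.internal) < self.internal_capacity:
            self.internal.append(return_address)
        else:
            self.external.append(return_address)

    def pop(self):
        # Entries are removed in LIFO order, draining the external buffer first.
        if self.external:
            return self.external.pop()
        return self.internal.pop()


store = SpillingReturnAddressStore(internal_capacity=2)
for address in (0x10, 0x20, 0x30):
    store.push(address)
buffer_after_pushes = store.active_buffer()  # the third entry spilled over
popped = store.pop()
buffer_after_pop = store.active_buffer()     # back in the internal buffer
```

The LIFO order is preserved across the boundary: the entry that spilled to the external buffer is the first one removed.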
[0061] In certain implementations, at least part of the external buffer 423-2 may be stored in one or more caches 104 of processor 102, such as the L2 cache and/or the L1 cache. In an illustrative example, a plurality of cache entries in the lowest-level data cache of processor 102 may be reserved to cache a plurality of entries of the external buffer 423-2.
[0062] In another aspect, the memory hosting the external buffer 423-2 can be configured to allow access only by privileged code, by employing the processor's memory protection mechanism, which provides several levels of access privilege. In one example, access privilege levels, also called protection rings, can be numbered from 0 to 3, with higher numbers meaning lesser privileges. Protection ring 0 can be reserved for segments that contain the most privileged code, data, and stacks, such as those of an operating system kernel. Outer protection rings can be used for application programs. In certain implementations, operating systems may use a subset of the plurality of protection rings, for example, ring 0 for the operating system kernel and ring 3 for applications. The processor can use privilege levels to prevent a process operating at a lesser privilege level from accessing a segment with a greater privilege. The current privilege level (CPL) is the privilege level of the currently running process. The CPL can be stored in bits 0 and 1 of the CS and SS segment registers. The CPL can be equal to the privilege level of the code segment from which instructions are being fetched. The processor may change the CPL when program control is transferred to a code segment with a different privilege level. The processor can perform a privilege level check by comparing the CPL to the privilege level of the segment or call gate being accessed (descriptor privilege level, DPL) and/or the requested privilege level (RPL) assigned to the segment selector being accessed. When the processor detects a privilege level violation, it can generate a general protection exception.
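By way of a non-limiting illustration, a privilege level check of the kind described above can be sketched as follows. The comparison rule used here, namely that a data segment access is permitted only when the numerically larger of CPL and RPL does not exceed the DPL, is an assumption drawn from conventional x86 protected-mode behavior rather than from the text above. The function names and the selector values 0x08 and 0x1B are hypothetical examples.

```python
class GeneralProtectionFault(Exception):
    """Stands in for the general protection exception."""


def privilege_bits(selector):
    """Extract the privilege level held in bits 0 and 1 of a segment selector."""
    return selector & 0b11


def check_data_segment_access(cpl, rpl, dpl):
    """Allow the access only if both CPL and RPL are at least as privileged as DPL.

    Numerically higher levels are less privileged, so the weaker of CPL and RPL
    is the larger of the two numbers.
    """
    effective_level = max(cpl, rpl)
    if effective_level > dpl:
        raise GeneralProtectionFault(
            f"level {effective_level} may not access a DPL {dpl} segment"
        )
    return True


kernel_cpl = privilege_bits(0x08)  # hypothetical ring 0 code selector
user_cpl = privilege_bits(0x1B)    # hypothetical ring 3 code selector

kernel_allowed = check_data_segment_access(kernel_cpl, rpl=0, dpl=0)
try:
    check_data_segment_access(user_cpl, rpl=3, dpl=0)
    user_blocked = False
except GeneralProtectionFault:
    user_blocked = True
```

Under this assumed rule, memory placed in a DPL 0 segment would be reachable by ring 0 code and would fault for ring 3 code.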
[0063] Thus, in an illustrative example, the memory hosting the external buffer 423-2 can be configured to allow access only by privileged code, such as the operating system kernel, which has a current privilege level (CPL) of 0. Alternatively, the memory hosting the external buffer 423-2 can be configured to allow read-only access, so that it can be modified only by return address verification logic 150.
[0064] As noted above in the present document, in response to receiving a call instruction, processor 102 may, before branching to the first instruction of the called procedure, push the address stored in the instruction pointer register (EIP) onto the current stack and into the return address store 423. During the execution of a return instruction within the called procedure, the processor 102 can retrieve and compare the return instruction pointers from the stack and from the return address store. If the two addresses match, the processor can continue executing the return instruction; otherwise, the processor may generate an exception.
[0065] In an additional aspect, depending on the processor architecture and/or operating system, the call stack can be legitimately modified by means other than a call instruction. For example, the C standard library's setjmp/longjmp functions provide the ability to restore a program state, including the instruction pointer, even across multiple levels of procedure calls. In another example, a return instruction can be used to transfer control to a dynamically computed entry point, for example by pushing the dynamically computed address onto the stack. In order to properly handle these and other situations in which the call stack is legitimately modified by means other than a call instruction, processor 102 may provide an alternative mechanism (i.e., one different from executing a call instruction) to modify the contents of the return address store and/or the return address buffer pointer.
[0066] Thus, in certain implementations, the processor 102 may have an instruction set that includes a return address buffer pointer modification instruction and/or a return address store modification instruction. In response to receiving a return address buffer pointer modification instruction, processor 102 may modify (e.g., increment or decrement) the return address buffer pointer without modifying the contents of the return address store. In response to receiving a return address store modification instruction, processor 102 may modify the contents of the return address store (e.g., store a new return address in the store or remove a return address from the store) and can also modify the return address buffer pointer accordingly (for example, increment or decrement it). In certain implementations, the return address buffer pointer modification instruction and/or the return address store modification instruction may be privileged instructions, e.g., executable only by a ring 0 process or thread, thus preventing unauthorized modification of the return address store by a user process or thread.
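By way of a non-limiting illustration, the two modification instructions can be modeled as privileged operations on a fixed-size store. The sketch rests on assumptions that go beyond the text above: the method names `modify_pointer` and `store_address`, the `cpl` argument used to model the ring 0 restriction, the downward-growing layout, and the longjmp-style unwinding scenario are all hypothetical.

```python
class PrivilegeViolation(Exception):
    """Raised when a privileged operation is attempted outside ring 0."""


class ReturnAddressStore:
    def __init__(self, size):
        self.entries = [None] * size
        self.pointer = size  # grows toward lower indices, like the stack

    def _require_ring0(self, cpl):
        if cpl != 0:
            raise PrivilegeViolation("operation requires ring 0")

    def modify_pointer(self, cpl, delta):
        """Model of the pointer modification instruction: contents are untouched."""
        self._require_ring0(cpl)
        new_pointer = self.pointer + delta
        if not 0 <= new_pointer <= len(self.entries):
            raise IndexError("pointer outside the store")
        self.pointer = new_pointer

    def store_address(self, cpl, address):
        """Model of the store modification instruction: writes and moves the pointer."""
        self.modify_pointer(cpl, -1)
        self.entries[self.pointer] = address


store = ReturnAddressStore(4)
for address in (0x100, 0x200, 0x300):
    store.store_address(cpl=0, address=address)

# A longjmp-style unwind across two frames: move the pointer, keep the contents.
store.modify_pointer(cpl=0, delta=2)
top_after_unwind = store.entries[store.pointer]

try:
    store.modify_pointer(cpl=3, delta=1)  # attempted from user mode
    user_rejected = False
except PrivilegeViolation:
    user_rejected = True
```

After the unwind, the pointer references the return address of the outermost frame while the skipped entries remain in place, and the user-mode attempt leaves the pointer unchanged.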
[0067] Figure 7 depicts a flow diagram of an exemplary method for procedure return address verification, in accordance with one or more aspects of the present disclosure. Method 700 may be performed by a computer system comprising hardware (e.g., circuitry, dedicated logic, and/or programmable logic), software (e.g., instructions executable on a computer system to perform hardware simulation), or a combination thereof. Method 700 and/or each of its functions, routines, subroutines, or operations may be performed by one or more physical processors of the computer system executing the method. Two or more functions, routines, subroutines, or operations of method 700 may be performed in parallel or in an order that may differ from the order described above. In one example, as illustrated by Figure 7, method 700 may be performed by computer system 100 of Figure 1.
[0068] Referring to Figure 7, at block 710, the processor of computer system 100 can modify a stack pointer. In one example, the processor may modify the stack pointer by placing a return address on the stack in response to receiving a call instruction, as described in more detail above in this document. Alternatively, the processor may directly modify the stack, for example by pushing a dynamically computed return address onto the stack, as described in more detail above in this document.
[0069] In block 720, the processor may modify a return address buffer pointer. In one example, the processor may modify the return address buffer pointer by placing a return address in the return address store in response to receiving a call instruction, as described in more detail above in this document. Alternatively, the processor may directly modify the return address store to reflect a direct stack modification, for example when a dynamically computed return address has been pushed onto the stack, as described in more detail above in this document.
[0070] In block 730, the processor may optionally execute one or more instructions, for example, instructions of the procedure that was called by the call instruction that caused the modifications to the stack pointer and the return address buffer pointer indicated by blocks 710 and 720.
[0071] In block 740, the processor may receive a return instruction (RET).
[0072] In response to establishing, at block 750, that the return address indicated by the stack pointer is equal to the return address indicated by the return address buffer pointer, the processor may execute the RET instruction: as indicated schematically at block 760, the return instruction pointer from the stack is stored in the EIP register in order to continue execution of the calling procedure.
[0073] In response to establishing, at block 750, that the return address indicated by the stack pointer does not match the return address indicated by the return address buffer pointer, the processor may generate a stack fault exception.
[0074] The methods and systems described above in this document can be implemented by computer systems of various architectures, designs, and configurations. Laptops, desktops, handheld PCs, personal digital assistants, engineering workstations, servers, network devices, network hubs, switches, embedded processors, digital signal processors (DSPs), graphics devices, video game devices, set-top boxes, microcontrollers, cell phones, portable media players, handheld devices, and various other electronic devices are also suitable for implementing the methods described in this document. In general, a wide variety of systems or electronic devices capable of incorporating a processor and/or other execution logic as disclosed herein are generally suitable for implementing the systems and methods described herein.
[0075] Figure 8 depicts a block diagram of an exemplary computer system, in accordance with one or more aspects of the present disclosure. As shown in Figure 8, the multiprocessor system 800 is a point-to-point interconnect system and includes a first processor 870 and a second processor 880 coupled via a point-to-point interconnect 850. Each of the processors 870 and 880 may be some version of processor 102 capable of performing return address verification, as described in more detail above herein. Although shown with only two processors 870, 880, it should be understood that the scope of the present disclosure is not limited thereto. In other embodiments, one or more additional processors may be present in the exemplary computer system.
[0076] Processors 870 and 880 are shown including integrated memory controller (IMC) units 872 and 882, respectively. Processor 870 also includes, as part of its bus controller units, point-to-point (P-P) interfaces 876 and 878; similarly, the second processor 880 includes P-P interfaces 886 and 888. Processors 870, 880 can exchange information through a point-to-point (P-P) interface 850 using P-P interface circuits 878, 888. As shown in Figure 8, the IMCs 872 and 882 couple the processors to respective memories, namely, a memory 832 and a memory 834, which may be portions of main memory locally attached to the respective processors.
[0077] Processors 870, 880 can each exchange information with a chipset 890 via individual P-P interfaces 852, 854 using point-to-point interface circuits 876, 894, 886, 898. The chipset 890 can also exchange information with a high-performance graphics circuit 838 via a high-performance graphics interface 839.
[0078] A shared cache (not shown) can be included in either processor or outside of both processors, yet connected with the processors via a P-P interconnect, so that the local cache information of either or both processors can be stored in the shared cache if a processor is placed in a low power mode.
[0079] Chipset 890 may be coupled to a first bus 816 via an interface 896. In one embodiment, the first bus 816 may be a Peripheral Component Interconnect (PCI) bus, or a bus such as a PCI Express bus or other third-generation I/O interconnect bus, although the scope of the present disclosure is not limited thereto.
[0080] As shown in Figure 8, multiple I/O devices 814 can be coupled to the first bus 816, along with a bus bridge 818 that couples the first bus 816 to a second bus 820. In one embodiment, the second bus 820 can be a low pin count (LPC) bus. Various devices may be coupled to the second bus 820 including, for example, a keyboard and/or mouse 822, communication devices 827, and a storage unit 828, such as a disk drive or other mass storage device, that may include instructions/code and data 830, in one embodiment. In addition, audio I/O 824 can be coupled to the second bus 820.
[0081] Figure 9 depicts a block diagram of an exemplary system on a chip (SoC) in accordance with one or more aspects of the present disclosure. Application processor 910 may be capable of performing return address verification, as described in more detail above herein. As schematically illustrated by Figure 9, the interconnect unit(s) 902 may be coupled to: an application processor 910 that includes a set of one or more cores 902A-N and shared cache unit(s) 906; a system agent unit 910; bus controller unit(s) 916; integrated memory controller unit(s) 914; a set of one or more media processors 920 which may include integrated graphics logic 908, an image processor 924 to provide still camera and/or video functionality, an audio processor 926 to provide hardware audio acceleration, and a video processor 928 to provide video encoding/decoding acceleration; a static random access memory (SRAM) unit 930; a direct memory access (DMA) unit 932; and a display unit 940 for coupling to one or more external displays.
[0082] Figure 10 depicts a block diagram of an exemplary computer system in accordance with one or more aspects of the present disclosure. Processor 1610 may be provided by some version of processor 102 capable of performing return address verification, as described in more detail above herein.
[0083] System 1600, illustrated schematically by Figure 10, may include any combination of components implemented as ICs, portions thereof, discrete electronic devices, or other modules, logic, hardware, software, firmware, or a combination thereof adapted in a computer system, or as components otherwise incorporated within a chassis of the computer system. The block diagram in Figure 10 is intended to show a high-level view of many components of the computer system. However, it should be understood that some of the components shown may be omitted, additional components may be present, and a different arrangement of the components shown may occur in other implementations.
[0084] Processor 1610 may be provided by a microprocessor, a multi-core processor, a multithreaded processor, an ultra-low voltage processor, an embedded processor, or other known processing element. In the illustrated implementation, processor 1610 acts as a main processing unit and as a central hub for communication with many of the various components of system 1600. For example, processor 1610 can be implemented as a system on a chip (SoC). As a specific illustrative example, processor 1610 includes an Intel® Architecture Core™ based processor such as an i3, i5, i7 or other similar processor available from Intel Corporation, Santa Clara, CA.
[0085] Processor 1610 can communicate with a system memory 1615. In various implementations, individual memory devices can be of different package types, e.g., single die package (SDP), dual die package (DDP), or quad die package (QDP). These devices, in some implementations, can be soldered directly to a motherboard to provide a lower-profile solution, while in other implementations, the devices can be configured as one or more memory modules that in turn couple to the motherboard via a given connector. Other memory implementations are possible, for example other types of memory modules, such as dual in-line memory modules (DIMMs) of different varieties including but not limited to microDIMMs and MiniDIMMs. In an illustrative example, the memory can be sized between 2GB and 16GB, and can be configured as a DDR3LM package or as LPDDR2 or LPDDR3 memory that is soldered onto a motherboard via a ball grid array (BGA).
[0086] In order to provide persistent storage of information such as data, applications, one or more operating systems, and so on, a mass storage 1620 may also be coupled to processor 1610. In certain implementations, in order to enable a thinner and lighter system design as well as to improve system responsiveness, the mass storage 1620 can be implemented via an SSD. In other implementations, mass storage may be primarily provided by a hard disk drive (HDD) with a smaller amount of SSD storage acting as an SSD cache to allow non-volatile storage of context state and other similar information during power-off events, so that a quick power-on can occur when system activities restart.
[0087] Also shown in Figure 10, a flash device 1622 may be coupled to a processor 1610, for example, via a serial peripheral interface (SPI). The 1622 flash device can provide non-volatile system storage of system software, including basic input/output software (BIOS) as well as other system firmware.
[0088] In many deployments, system mass storage may be provided by an SSD alone or as a disk, optical, or other drive with an SSD cache. In some deployments, mass storage can be provided by an SSD or as an HDD along with a Restoration Cache Module (RST). The SSD cache can be configured as a single-level cache (SLC) option or as a multi-level cache (MLC) to provide an appropriate level of responsiveness.
[0089] Various input/output (I/O) devices may be present within a 1600 system, including, for example, a 1624 display which may be provided by a high definition LCD or LED panel configured within a cover portion of the chassis. This display panel may also provide a 1625 touchscreen externally adapted through the display panel so that through user interaction with that touchscreen, user input can be provided to the system in order to to enable the desired operations, for example in relation to displaying information, accessing information and so on. In certain implementations, the display 1624 may be coupled to the processor 1610 via a display interconnect which may be deployed as a high performance graphics interconnect. Touchscreen 1625 may be coupled to processor 1610 via another interconnect, which, in one embodiment, may be I2C interconnect. In addition to the 1625 touchscreen, user input via touch can also take place via a 1630 touch pad that can be configured within the chassis and can also be coupled to the same I2C interconnect. 1625 touchscreen.
[0090] Various sensors may be present within the system and may be coupled to the 1610 processor in different ways. Certain inert and environmental sensors can couple to the 1610 processor through a 1640 sensor hub, for example, via an I2C interconnect. These sensors may include an accelerometer 1641, an ambient light sensor (ALS) 1642, a compass 1643, and a gyroscope 1644. Other environmental sensors may include one or more thermal sensors 1646 that, in some embodiments, couple to the 1610 processor for via a system management bus (SMBus). In certain deployments, one or more infrared or other heat-capturing elements, or any other element to capture a user's presence or movement, may be present.
[0091] Various peripheral devices can couple to the 1610 processor through a low pin count (LPC) interconnect. In certain implementations, various components may be coupled via an embedded controller 1635. Such components may include a keyboard 1636 (e.g., coupled via a PS2 interface), a fan 1637, and a thermal sensor 1639. In some embodiments, the touch pad 163 0 can also dock with the EC 163 5 via a PS2 interface. In addition, a security processor, such as a 1638 Trusted Computing Group (TCG) TPM Specification Version 1.2, dated October 2, 2003, Trusted Computing Group (TCG) Platform Module (TPM) may also mate with the 1610 processor by through this LPC interconnect.
[0092] In certain deployments, the peripheral ports may include a High Definition Media Interface (HDMI) connector (which may have different form factors such as actual size, mini or micro); one or more USB ports, such as full-size external ports conforming to the Universal Serial Bus Revision 3.0 Specification (November 2008), at least one of which is powered to charge USB devices (such as smart phones) when the system is in a Connected Standby state and is plugged into an AC wall adapter. In addition, one or more Thunderbolt™ ports may be provided. Other ports may include an externally accessible card reader, such as a full-size SD-XC card and/or a WWAN SIM card reader (e.g., an 8-pin card reader). For audio, a 3.5mm adapter capable of stereo sound and microphone (e.g. combo functionality) may be present, with adapter detection support (e.g. headset only supports using microphone in the cover or headset with cable microphone). In some embodiments, this adapter can be reassignable between a headphone and a stereo microphone input. In addition, a power adapter can be provided for coupling to an AC source.
[0093] System 1600 can communicate with external devices in a variety of ways, including wirelessly. In the embodiment shown in Figure 16, several wireless modules are present, each of which may correspond to a radio configured for a particular wireless communication protocol. One means of short-range wireless communication, e.g., near field, may be a near field communication (NFC) unit 1645, which may communicate, in one embodiment, with processor 1610 via an SMBus.
[0094] Additional wireless units may include other short-range wireless engines, including a WLAN unit 1650 and a Bluetooth unit 1652. Using WLAN unit 1650, Wi-Fi™ communications in accordance with a given Institute of Electrical and Electronics Engineers (IEEE) 802.11 standard can be realized, while via Bluetooth unit 1652, short-range communications via a Bluetooth protocol can occur. These units may communicate with processor 1610 via, for example, a USB link or a universal asynchronous receiver-transmitter (UART) link. Alternatively, these units may couple to processor 1610 via an interconnect in accordance with a Peripheral Component Interconnect Express™ (PCIe™) protocol, for example, in accordance with the PCI Express™ Base Specification version 3.0 (published January 17, 2007), or another such protocol, such as a serial data input/output (SDIO) standard. Of course, the actual physical connection between these peripheral devices, which may be configured on one or more add-in cards, can be by way of NGFF connectors adapted to a motherboard.
[0095] In addition, wireless wide area communications, for example according to a cellular or other wireless wide area protocol, can occur via a WWAN unit 1656, which in turn may couple to a subscriber identity module (SIM) 1657. In addition, to enable receipt and use of location information, a GPS module 1655 may also be present.
[0096] To provide audio inputs and outputs, an audio processor may be implemented via a digital signal processor (DSP) 1660, which may couple to processor 1610 via a high definition audio (HDA) link. Similarly, DSP 1660 may communicate with an integrated encoder/decoder (CODEC) and amplifier 1662 which, in turn, may couple to output speakers 1663 that may be implemented within the chassis. Similarly, amplifier and CODEC 1662 may be coupled to receive audio inputs from a microphone 1665.
[0097] Figure 11 depicts a block diagram of an exemplary system on a chip (SoC) in accordance with one or more aspects of the present disclosure. As a specific illustrative example, SOC 1700 may be included in user equipment (UE). In one embodiment, UE refers to any device to be used by an end user to communicate, for example, a handheld phone, a smartphone, a tablet, an ultra-thin notebook, a notebook with a broadband adapter, or any other similar communication device. Often, a UE connects to a base station or node, which potentially corresponds in nature to a mobile station (MS) in a GSM network.
[0098] As schematically illustrated by Figure 11, SOC 1700 may include two cores. Cores 1706 and 1707 may be coupled to cache control 1708, which is associated with bus interface unit 1709 and L2 cache 1710, to communicate with other parts of system 1700. Interconnect 1710 may include an on-chip interconnect, for example an IOSF, AMBA, or other interconnect.
[0099] Interface 1710 may provide communication channels to other components, for example, a Subscriber Identity Module (SIM) 1730 to interface with a SIM card, a boot ROM 1735 to hold boot code for execution by cores 1706 and 1707 to initialize and boot SOC 1700, an SDRAM controller 1740 to interface with external memory (e.g., DRAM 1760), a flash controller 1745 to interface with non-volatile memory (e.g., flash 1765), a peripheral control 1550 (e.g., Serial Peripheral Interface) to interface with peripherals, video codecs 1720 and video interface 1725 to display and receive input (e.g., touch-enabled input), a GPU 1715 to perform graphics-related computations, etc. In addition, the system may comprise peripherals for communication, for example, a Bluetooth module 1770, a 3G modem 1775, GPS 1785, and WiFi 1785.
[00100] Other computer system designs and configurations may also be suitable for implementing the systems and methods described in this document. The following examples illustrate various implementations in accordance with one or more aspects of the present disclosure.
[00101] Example 1 is a processing system comprising: a stack pointer configured to reference a first return address stored in a stack; a return address temporary storage pointer configured to reference a second return address stored in a return address store; and a return address check logic configured, in response to receipt of a return instruction, to compare the first return address to the second return address.
[00102] In Example 2, the return address check logic of the processing system of Example 1 can be further configured to execute the return instruction in response to the determination that the first return address is equal to the second return address.
[00103] In Example 3, the return address check logic of the processing system of Example 1 can be further configured to generate a stack fault exception in response to the determination that the first return address differs from the second return address.
[00104] In Example 4, the return address check logic of the processing system of Example 1 can be further configured, in response to receiving a call instruction, to store a return address on the stack and in the return address store.
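The behavior described in Examples 1 to 4 can be sketched in software as follows. This is a minimal illustrative model, not the hardware logic of the disclosure: the class name `Machine`, the exception name `StackFault`, and the use of Python lists in place of the stack pointer and the return address temporary storage pointer are all assumptions made for the sketch.

```python
class StackFault(Exception):
    """Hypothetical stand-in for the stack fault exception of Example 3."""


class Machine:
    """Toy model: an ordinary stack plus a separate return address store."""

    def __init__(self):
        self.stack = []         # ordinary stack; the top models the stack pointer
        self.return_store = []  # return address store; the top models its pointer

    def call(self, return_address):
        # Example 4: a call stores the return address in both places.
        self.stack.append(return_address)
        self.return_store.append(return_address)

    def ret(self):
        # Example 1: on a return, compare the two copies.
        if not self.stack or not self.return_store:
            raise StackFault("return without a matching call")
        first = self.stack[-1]          # first return address (from the stack)
        second = self.return_store[-1]  # second return address (from the store)
        if first != second:
            # Example 3: a mismatch raises a stack fault.
            raise StackFault(f"stack holds {first:#x}, store holds {second:#x}")
        # Example 2: the copies match, so the return is carried out.
        self.stack.pop()
        self.return_store.pop()
        return first
```

In this model, overwriting the top of `stack` after a `call` (as an attacker corrupting the stack might) causes the next `ret` to raise `StackFault` instead of returning the corrupted address.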
[00105] In Example 5, the return address check logic of the processing system of Example 1 can be further configured, in response to receiving a return address store modification instruction, to do at least one of: store a return address in the return address store or remove a return address from the return address store.
[00106] In Example 6, the return address check logic of the processing system of Example 1 can be further configured, in response to receiving a return address buffer pointer modification instruction, to perform at least one of: incrementing the return address buffer pointer or decrementing the return address buffer pointer.
[00107] In Example 7, the return address store modification instruction of any of Examples 5 to 6 may be a privileged instruction.
[00108] In Example 8, the processing system stack of any of Examples 1 to 7 may reside in memory communicatively coupled to the processing system.
[00109] In Example 9, the processing system return address store of any of Examples 1 to 7 may reside, at least partially, in a memory built into the processing system.
[00110] In Example 10, the processing system return address store of any one of Examples 1 to 7 may comprise a first portion residing in memory embedded in the processing system and a second portion residing in external memory.
[00111] In Example 11, the external memory of the processing system of Example 10 can be provided by a read-only memory.
[00112] In Example 12, the second portion of the processing system return address store of any one of Examples 10 to 11 can be configured to operate as an overflow buffer with respect to the first portion.
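The two-portion arrangement of Examples 10 to 12 can be illustrated with a short sketch in which a fixed-capacity internal portion is filled first and an external portion receives the overflow. The class name `ReturnAddressStore`, the `internal_capacity` parameter, and the list-based layout are hypothetical and chosen only for illustration; the disclosure does not prescribe this structure.

```python
class ReturnAddressStore:
    """Toy return address store with an internal portion and an overflow portion."""

    def __init__(self, internal_capacity):
        self.internal_capacity = internal_capacity
        self.internal = []  # first portion (models memory embedded in the processor)
        self.external = []  # second portion (models external memory used as overflow)

    def push(self, address):
        # Fill the internal portion first; spill into the external portion when full.
        if len(self.internal) < self.internal_capacity:
            self.internal.append(address)
        else:
            self.external.append(address)

    def pop(self):
        # The external portion is non-empty only while the internal portion is
        # full, so draining it first preserves last-in, first-out order.
        if self.external:
            return self.external.pop()
        return self.internal.pop()  # raises IndexError if the store is empty
```

With an internal capacity of two, pushing three addresses places the third in the external portion, and the addresses still come back in reverse order of insertion.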
[00113] Example 13 is a method for checking the return address of a procedure comprising: modifying, by a processing system, a stack pointer; modifying a return address temporary storage pointer; receiving a return instruction; comparing a first return address indicated by the stack pointer to a second return address indicated by the return address temporary storage pointer; and executing the return instruction in response to the determination that the first return address is equal to the second return address.
[00114] In Example 14, the method of Example 13 may further comprise raising a stack fault exception in response to the determination that the first return address differs from the second return address.
[00115] In Example 15, the method of Example 13 may additionally comprise: storing a return address on the stack and in the return address store in response to receiving a call instruction.
[00116] In Example 16, the method of Example 13 may further comprise: receiving a return address store modification instruction; and performing at least one of: storing a return address in the return address store or removing a return address from the return address store.
[00117] In Example 17, the method of Example 13 may further comprise: receiving a return address store modification instruction; and performing at least one of: incrementing the return address buffer pointer or decrementing the return address buffer pointer.
[00118] In Example 18, the method of Example 13 may further comprise: initializing the return address temporary storage pointer to point to an internal return address store; modifying the return address temporary storage pointer; determining that a limit of the internal return address store has been reached; and causing the return address temporary storage pointer to point to an external return address store.
[00119] Example 19 is an apparatus comprising a memory and a processing system coupled to the memory, wherein the processing system is configured to perform the method of any one of Examples 13 to 18.
[00120] Example 20 is a non-transitory computer-readable storage medium comprising executable instructions that, when executed by a processing system, cause the processing system to perform operations comprising: modifying a stack pointer; modifying a return address temporary storage pointer; receiving a return instruction; comparing a first return address indicated by the stack pointer to a second return address indicated by the return address temporary storage pointer; and executing the return instruction in response to the determination that the first return address is equal to the second return address.
[00121] In Example 21, the non-transitory computer-readable storage medium of Example 20 may additionally comprise executable instructions that cause the processing system to generate a stack fault exception in response to determining that the first return address differs from the second return address.
[00122] In Example 22, the non-transitory computer-readable storage medium of Example 20 may additionally comprise executable instructions that cause the processing system to store a return address on the stack and in the return address store in response to receiving a call instruction.
[00123] In Example 23, the non-transitory computer-readable storage medium of Example 20 may additionally comprise executable instructions that cause the processing system to receive a return address store modification instruction and perform at least one of: storing a return address in the return address store or removing a return address from the return address store.
[00124] In Example 24, the non-transitory computer-readable storage medium of Example 20 may additionally comprise executable instructions that cause the processing system to receive a return address store modification instruction and perform at least one of: incrementing the return address temporary storage pointer or decrementing the return address temporary storage pointer.
[00125] In Example 25, the non-transitory computer-readable storage medium of Example 20 may additionally comprise executable instructions that cause the processing system to: initialize the return address temporary storage pointer to point to an internal return address store; modify the return address temporary storage pointer; determine that a limit of the internal return address store has been reached; and cause the return address temporary storage pointer to point to an external return address store.
[00126] Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
[00127] It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as "encrypt", "decrypt", "store", "provide", "derive", "obtain", "receive", "authenticate", "erase", "execute", "request", "communicate", or the like, refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (e.g., electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other such information storage, transmission or display devices.
[00128] The words "example" or "exemplary" are used in this document to mean serving as an example, instance or illustration. Any aspect or design described herein as "example" or "exemplary" is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the words "example" or "exemplary" is intended to present concepts in a concrete fashion. As used in the present application, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or". That is, unless specified otherwise, or clear from context, "X includes A or B" is intended to mean any of the natural inclusive permutations. That is, if X includes A; X includes B; or X includes both A and B, then "X includes A or B" is satisfied under any of the foregoing instances. In addition, the articles "a" and "an" as used in the present application and the appended claims should generally be construed to mean "one or more" unless specified otherwise or clear from context to be directed to a singular form. Moreover, use of the term "an embodiment" or "one embodiment" or "an implementation" or "one implementation" throughout is not intended to mean the same embodiment or implementation unless described as such. Also, the terms "first", "second", "third", "fourth", etc., as used herein, are meant as labels to distinguish among different elements and may not necessarily have an ordinal meaning according to their numerical designation.
[00129] The embodiments described in this document may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory computer-readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, flash memory, or any type of media suitable for storing electronic instructions. The term "computer-readable storage medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database and/or associated caches and servers) that store the one or more sets of instructions. The term "computer-readable medium" shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present embodiments. The term "computer-readable storage medium" shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, magnetic media, and any medium that is capable of storing a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the present embodiments.
[00130] The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Unless otherwise indicated, various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the required method operations. The required structure for a variety of these systems will be evident from the description below. Furthermore, the present embodiments are not described with reference to any particular programming language. It will be apparent that a variety of programming languages can be used to implement the teachings of the embodiments as described herein.
[00131] The above description sets forth numerous specific details, such as examples of specific systems, components, methods and so on, in order to provide a good understanding of several embodiments. However, it will be apparent to a person skilled in the art that at least some embodiments can be practiced without these specific details. In other instances, well-known components or methods are not described in detail or are presented in simple block diagram format in order to avoid unnecessarily obscuring the present embodiments. Accordingly, the specific details set forth above are merely exemplary. Particular implementations may vary from these exemplary details and still be contemplated to be within the scope of the present embodiments.
[00132] It is to be understood that the above description is intended to be illustrative and not restrictive. Many other embodiments will be apparent to those skilled in the art upon reading and understanding the above description. The scope of the present embodiments should therefore be determined with reference to the appended claims, together with the full scope of equivalents to which such claims are entitled.

Claims:
Claims (10)
[0001]
1. A processing system (102) comprising: a stack pointer configured to reference a first return address stored in a stack; a return address temporary storage pointer (421) configured to reference a second return address stored in a return address temporary storage (423); characterized by further comprising: a return address check logic (150) configured to: in response to detection of a dynamically computed address being pushed onto the stack by an instruction other than a call instruction, store (720) the dynamically computed address in the return address temporary storage; in response to receipt of a return instruction, compare the first return address to the second return address; and in response to determining that the first return address is equal to the second return address, execute the return instruction.
[0002]
2. Processing system according to claim 1, characterized in that the return address verification logic is additionally configured to generate a stack fault exception in response to the determination that the first return address differs from the second return address.
[0003]
3. Processing system according to claim 1, characterized in that the return address verification logic is additionally configured, in response to receipt of a return address store modification instruction, to perform at least one of: increment the return address buffer pointer or decrease the return address buffer pointer.
[0004]
4. Processing system, according to any one of claims 1 to 3, characterized in that the stack resides within a memory communicatively coupled to the processing system.
[0005]
5. Processing system according to any one of claims 1 to 3, characterized in that the return address temporary storage resides, at least partially, within a memory communicatively coupled to the processing system.
[0006]
6. Processing system according to claim 1, characterized in that the return address temporary storage comprises a first portion residing within a memory incorporated in the processing system and a second portion residing within an external memory.
[0007]
7. Processing system, according to claim 6, characterized in that the external memory is provided by a read-only memory.
[0008]
8. Processing system according to any one of claims 6 to 7, characterized in that the second portion is configured to operate as an overflow buffer with respect to the first portion.
[0009]
9. Method (700) comprising: in response to detection of a dynamically computed address being pushed onto the stack by an instruction other than a call instruction, storing (720) the dynamically computed address in the return address buffer; receiving (740) a return instruction; comparing (750) a first return address referenced by the stack pointer to a second return address referenced by the return address buffer pointer; and executing (760) the return instruction in response to the determination that the first return address is equal to the second return address.
[0010]
10. The method of claim 9, further comprising generating (770) a stack fault exception in response to the determination that the first return address differs from the second return address.
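As a rough illustration of the method of claims 9 and 10, the sketch below models a push of a dynamically computed target by an instruction other than a call. How the hardware detects such a push is not modeled: the `is_computed_target` flag, the class name `CheckedStack`, and the exception name `StackFault` are assumptions introduced only for this sketch.

```python
class StackFault(Exception):
    """Hypothetical stand-in for the stack fault exception of claim 10."""


class CheckedStack:
    """Toy model of a stack checked against a return address buffer."""

    def __init__(self):
        self.stack = []          # ordinary stack
        self.return_buffer = []  # return address buffer

    def push(self, value, is_computed_target=False):
        # A non-call instruction pushes a value onto the stack. When the value
        # is detected as a dynamically computed return target (modeled here by
        # an explicit flag), it is also stored in the return address buffer.
        self.stack.append(value)
        if is_computed_target:
            self.return_buffer.append(value)

    def ret(self):
        # Compare the address on the stack with the one in the buffer.
        if not self.stack or not self.return_buffer:
            raise StackFault("no recorded return address to compare against")
        first, second = self.stack[-1], self.return_buffer[-1]
        if first != second:
            raise StackFault("return address mismatch")
        self.stack.pop()
        self.return_buffer.pop()
        return first
```

Under these assumptions, a push flagged as a computed target followed by `ret` succeeds, while a return through a value that was pushed as plain data raises `StackFault`.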

Similar technologies:

Publication number | Publication date | Patent title

BR112015029289B1|2022-01-11|SYSTEMS AND METHODS FOR VERIFICATION OF PROCEDURE RETURN ADDRESS

EP2889800B1|2017-06-28|Using authenticated manifests to enable external certification of multi-processor platforms

US9792222B2|2017-10-17|Validating virtual address translation by virtual machine monitor utilizing address validation structure to validate tentative guest physical address and aborting based on flag in extended page table requiring an expected guest physical address in the address validation structure

US9858140B2|2018-01-02|Memory corruption detection

US9501668B2|2016-11-22|Secure video ouput path

US9785576B2|2017-10-10|Hardware-assisted virtualization for implementing secure video output path

US9852301B2|2017-12-26|Creating secure channels between a protected execution environment and fixed-function endpoints

US11113217B2|2021-09-07|Delivering interrupts to user-level applications

US9652375B2|2017-05-16|Multiple chunk support for memory corruption detection architectures

US10230528B2|2019-03-12|Tree-less integrity and replay memory protection for trusted execution environment

US9705892B2|2017-07-11|Trusted time service for offline mode

US9417879B2|2016-08-16|Systems and methods for managing reconfigurable processor cores

US20170093571A1|2017-03-30|Double affine mapped s-box hardware accelerator

US9959939B2|2018-05-01|Granular cache repair

US20150178203A1|2015-06-25|Optimized write allocation for two-level memory

Patent family:

Publication number | Publication date

EP3014461A1|2016-05-04|

CN105264513A|2016-01-20|

EP3014461B1|2021-04-07|

RU2015150173A|2017-05-26|

WO2014209541A1|2014-12-31|

US9015835B2|2015-04-21|

RU2628163C2|2017-08-15|

US20140380468A1|2014-12-25|

EP3014461A4|2017-03-01|

BR112015029289A2|2017-07-25|

CN105264513B|2018-01-23|

Cited references:

Publication number | Filing date | Publication date | Applicant | Patent title

US5604877A|1994-01-04|1997-02-18|Intel Corporation|Method and apparatus for resolving return from subroutine instructions in a computer processor|

US5964868A|1996-05-15|1999-10-12|Intel Corporation|Method and apparatus for implementing a speculative return stack buffer|

US5850543A|1996-10-30|1998-12-15|Texas Instruments Incorporated|Microprocessor with speculative instruction pipelining storing a speculative register value within branch target buffer for use in speculatively executing instructions after a return|

DE19701166A1|1997-01-15|1998-07-23|Siemens Ag|Procedure for monitoring the proper execution of software programs|

US7086088B2|2002-05-15|2006-08-01|Nokia, Inc.|Preventing stack buffer overflow attacks|

JP3856737B2|2002-07-19|2006-12-13|株式会社ルネサステクノロジ|Data processing device|

US20040049666A1|2002-09-11|2004-03-11|Annavaram Murali M.|Method and apparatus for variable pop hardware return address stack|

US6996677B2|2002-11-25|2006-02-07|Nortel Networks Limited|Method and apparatus for protecting memory stacks|

US20040168078A1|2002-12-04|2004-08-26|Brodley Carla E.|Apparatus, system and method for protecting function return address|

US7287283B1|2003-09-25|2007-10-23|Symantec Corporation|Return-to-LIBC attack blocking system and method|

US20050138263A1|2003-12-23|2005-06-23|Mckeen Francis X.|Method and apparatus to retain system control when a buffer overflow attack occurs|

US20080148399A1|2006-10-18|2008-06-19|Microsoft Corporation|Protection against stack buffer overrun exploitation|

CN101241464B|2007-02-05|2010-08-18|中兴通讯股份有限公司|Method for checking stack frame destruction|

JP2008299795A|2007-06-04|2008-12-11|Nec Electronics Corp|Branch prediction controller and method thereof|

US9589133B2|2014-08-08|2017-03-07|International Business Machines Corporation|Preventing return-oriented programming exploits|

US9767272B2|2014-10-20|2017-09-19|Intel Corporation|Attack Protection for valid gadget control transfers|

CN106537405A|2014-11-26|2017-03-22|宇龙计算机通信科技有限公司|Multimedia file processing method, multimedia file processing apparatus and terminal|

US9646154B2|2014-12-12|2017-05-09|Microsoft Technology Licensing, Llc|Return oriented programming attack protection|

US9965619B2|2015-07-13|2018-05-08|Intel Corporation|Return address overflow buffer|

US10394556B2|2015-12-20|2019-08-27|Intel Corporation|Hardware apparatuses and methods to switch shadow stack pointers|

US10430580B2|2016-02-04|2019-10-01|Intel Corporation|Processor extensions to protect stacks during ring transitions|

US10223527B2|2016-09-20|2019-03-05|International Business Machines Corporation|Protecting computer code against ROP attacks|

US10157268B2|2016-09-27|2018-12-18|Microsoft Technology Licensing, Llc|Return flow guard using control stack identified by processor register|

CN107608905B|2017-09-11|2020-05-12|杭州中天微系统有限公司|Method and device for erasing Flash data|

US10740452B2|2017-09-15|2020-08-11|Arm Limited|Call path dependent authentication|

RU2666458C1|2017-11-27|2018-09-07|Акционерное общество "МЦСТ"|Microprocessor|

US10606771B2|2018-01-22|2020-03-31|Infineon Technologies Ag|Real time stack protection|

CN112100686A|2020-08-28|2020-12-18|浙江大学|Core code pointer integrity protection method based on ARM pointer verification|

Legal status:
2020-02-18| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|

2021-07-20| B350| Update of information on the portal [chapter 15.35 patent gazette]|

2021-10-19| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|

2022-01-11| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 30/05/2014, SUBJECT TO THE LEGAL CONDITIONS. |

Priority:

Application number | Filing date | Patent title

US13/924,591|2013-06-23|

US13/924,591|US9015835B2|2013-06-23|2013-06-23|Systems and methods for procedure return address verification|

PCT/US2014/040223|WO2014209541A1|2013-06-23|2014-05-30|Systems and methods for procedure return address verification|
